Seeking for Simplicity in Complex Networks, and Its Consequences for Cascade Failures

Complex networks can be understood as graphs whose connectivity deviates from that of regular
or near-regular graphs, which are understood as being 'simple'. While a great deal of the attention so
far dedicated to complex networks has been duly driven by the 'complex' nature of these structures,
in this work we address the identification of simplicity, in the sense of regularity, in complex
networks. The basic idea is to seek subgraphs exhibiting small dispersion (e.g. standard deviation
or entropy) of local measurements such as the node degree and clustering coefficient. This approach
paves the way for the identification of subgraphs (patches) with nearly uniform connectivity,
therefore complementing the characterization of the complexity of networks. We also analyze
cascade failures, revealing that the removal of vertices in 'simple' regions results in smaller damage
to the network structure than the removal of vertices in the heterogeneous regions. We illustrate the
potential of the proposed methodology with respect to four theoretical models as well as
protein-protein interaction networks of three different species. Our results suggest that the
simplicity of protein interaction networks grows as the result of natural selection. This increase
in simplicity makes these networks more robust to cascade failures.

PACS numbers: 89.75.Hc,89.75.-k,89.75.Kd 



The rise of the complex networks research area was
ultimately motivated by the finding that graphs obtained
from natural or human-made structures tend to present
intricate connectivity when compared to regular
(e.g. lattices and meshes) or nearly regular graphs
(e.g. Flory and Erdős-Rényi networks). Given their
structured connectivity, complex networks have provided
excellent models for several 'complex' real-world systems,
ranging from the Internet to protein-protein interaction.
Yet the dichotomy between simplicity and complexity
continues to motivate related investigations.

The purpose of the present work is precisely to develop
means to identify simplicity and regularity in complex
networks in order not only to obtain better characterization
of such structures, but also to investigate how such
features can affect respective dynamics such as cascade
failures. More specifically, the motivation for finding
regularities in complex networks includes but is not limited
to the following issues: (1) the properties and distribution
of regular patches can help the characterization,
understanding and modeling of complex networks; (2) the
presence of regular patches in a complex network may
suggest a hybrid nature of that network (e.g. a combination
of the regular and scale-free paradigms); and (3)
regular patches can have strong influence on dynamical
processes taking place on the network.

In graph theory, a regular graph is characterized by
having all its nodes with exactly the same degree. However,
such a definition is limited in the sense that it is
still possible to obtain a large diversity of such 'regular'
graphs. Figure 1 presents three examples of 'regular'
graphs. All nodes in these graphs exhibit the same degree,
equal to four. While the structures in (a) and (c)
have more 'regular' connectivity, the graph in (b) looks
rather irregular. Actually, even the structure in (c) has
irregularities if we consider measurements such as the
individual clustering coefficient. It is clear from such an
example that identical node degree is not enough to ensure
uniform connectivity.




FIG. 1: Three examples of 'regular' graphs, with all nodes 
with degree equal to 4. 



As we are interested in characterizing and identifying
regularity (i.e. 'simplicity') in complex networks, the first
important step consists of stating clearly what is meant
by regularity, uniformity or simplicity. Perhaps the
strictest imposition of regularity on a network is that every
node be indistinguishable from any other node,
whatever the considered measurements. Therefore, identical
measurements (e.g. node degree, clustering coefficient,
hierarchical degrees, etc.) would be obtained for
any of the nodes in the perfectly regular network. In this
sense, regularity becomes closely related to symmetries
in the networks.

However, as we want to achieve increased flexibility, in
this work we relax the above definition and propose that
a graph (or subgraph) is regular whenever all its nodes
present similar values for a set of measurements. Therefore,
this definition is relative to the allowed degree of
dispersion of the measurements, as well as to the choice of
the set of measurements itself. Interestingly, traditional
regularity in graph theory can be understood as a particular
instance of the above definition in the case of null
dispersion of node degree. Here we allow some tolerance
for the variability of the measurements (e.g. in order
to cope with incompleteness or noise during the network
construction).

In order to characterize the local structure around each
vertex, we considered the degree, the clustering coefficient,
the average neighboring degree and the locality index. The
degree of a node $i$, henceforth abbreviated as $k(i)$, is defined
as the number of immediate neighbors of $i$. The clustering
coefficient of that node, represented as $cc(i)$, is
calculated by dividing the number of edges between the
immediate neighbors of $i$, i.e. $n_E(i)$, by the maximum
possible number of connections between those nodes, i.e.
$n_S(i) = k(i)(k(i) - 1)/2$. The average neighboring degree
$r(i)$ of a node $i$ corresponds to the average of
the degrees of the immediate neighbors of $i$. The locality
index $loc(i)$ of a node $i$ corresponds to the ratio between
the number of connections among the set of nodes comprising
$i$ and its immediate neighbors and the sum
of the edges connected to all those nodes. This measurement
has been motivated by the matching index,
which is adapted here in order to reflect the 'internality'
of the connections of all the immediate neighbors of a
given reference node, instead of a single edge.
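The four local measurements can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function name `local_measurements` and the adjacency-dictionary input are assumptions, and the locality index follows one possible reading of the text (distinct edges with both ends in the closed neighborhood of $i$, divided by distinct edges with at least one end in it).

```python
def local_measurements(adj):
    """Return {node: (k, cc, r, loc)} for an undirected graph.

    `adj` (hypothetical input format) maps each node to the set of its
    immediate neighbors; self-loops are assumed to be absent.
    """
    result = {}
    for i, nbrs in adj.items():
        k = len(nbrs)
        # Edges among the neighbors of i; each one is seen from both ends.
        n_e = sum(1 for u in nbrs for v in adj[u] if v in nbrs) // 2
        cc = 2.0 * n_e / (k * (k - 1)) if k > 1 else 0.0
        # Average degree of the immediate neighbors of i.
        r = sum(len(adj[u]) for u in nbrs) / k if k > 0 else 0.0
        # Locality index (assumed reading): internal edges of the closed
        # neighborhood over all edges touching that neighborhood.
        closed = set(nbrs) | {i}
        internal, touching = set(), set()
        for u in closed:
            for v in adj[u]:
                edge = frozenset((u, v))
                touching.add(edge)
                if v in closed:
                    internal.add(edge)
        loc = len(internal) / len(touching) if touching else 0.0
        result[i] = (k, cc, r, loc)
    return result


# A triangle (0, 1, 2) with a pendant node 3 attached to node 2.
example = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
measurements = local_measurements(example)
```

If the denominator of the locality index is instead meant to be the sum of the degrees of those nodes, only the `touching` bookkeeping would need to change.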

In order to find the regular patches, the set of measurements
defining homogeneity needs to be previously
selected. Though such a set may include just the node
degree (compatible with the traditional concept of regular
graphs), because we want to impose stricter demands
on regularity it is necessary to consider additional
features describing the network connectivity around each
node. Each vertex $i$ is represented by a vector of $M$
measurements, $X_i$, which is normalized [3] so that each
measurement has zero mean and unit standard deviation. Then
the network, which is represented by the set of such
vectors $X = \{X_1, X_2, \ldots, X_N\}$, is projected into a
two-dimensional space by considering principal component
analysis (PCA) (e.g. [3]). Such a statistical mapping
implements a linear transformation (actually a rotation
in the phase space) that ensures that the maximum
dispersion of the points is achieved along the first
projection axes (i.e. those corresponding to the largest
absolute values of eigenvalues of the covariance matrix of
the data). In addition, such a transformation optimally
removes the redundancy of the data, which is fundamental
in our analysis since local measurements are known
to be correlated in most real-world networks.
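As a rough illustration of the normalization and projection steps, the sketch below standardizes the measurement vectors and projects them onto the two leading principal axes. It uses only the Python standard library, with power iteration and deflation standing in for a proper eigen-solver; this substitution and all function names are assumptions for illustration. Power iteration converges slowly when the leading eigenvalues are nearly equal, so a linear-algebra library would be preferable in practice.

```python
import math


def standardize(rows):
    """Rescale every column (measurement) to zero mean and unit std."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    # A constant column has zero spread; dividing by 1.0 keeps it at zero.
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
            for j in range(m)]
    return [[(r[j] - means[j]) / stds[j] for j in range(m)] for r in rows]


def covariance(rows):
    """Covariance matrix of rows that are already centered."""
    n, m = len(rows), len(rows[0])
    return [[sum(r[a] * r[b] for r in rows) / n for b in range(m)]
            for a in range(m)]


def matvec(mat, vec):
    return [sum(x * y for x, y in zip(row, vec)) for row in mat]


def top_eigenpairs(cov, count=2, iters=1000):
    """Leading (eigenvalue, eigenvector) pairs of a symmetric matrix."""
    mat = [row[:] for row in cov]
    m = len(mat)
    pairs = []
    for idx in range(count):
        vec = [1.0 + 0.1 * (j + idx) for j in range(m)]  # arbitrary start
        norm = math.sqrt(sum(x * x for x in vec))
        vec = [x / norm for x in vec]
        for _ in range(iters):
            nxt = matvec(mat, vec)
            norm = math.sqrt(sum(x * x for x in nxt))
            if norm < 1e-12:  # no variance left along any direction
                break
            vec = [x / norm for x in nxt]
        lam = sum(x * y for x, y in zip(vec, matvec(mat, vec)))
        pairs.append((lam, vec))
        # Deflation: remove the component that was just extracted.
        for a in range(m):
            for b in range(m):
                mat[a][b] -= lam * vec[a] * vec[b]
    return pairs


def project(rows, pairs):
    """Coordinates of each row along the extracted principal axes."""
    return [[sum(x * y for x, y in zip(r, vec)) for _, vec in pairs]
            for r in rows]
```

A typical call would be `z = standardize(rows)` followed by `project(z, top_eigenpairs(covariance(z)))`, where each row of `rows` holds the $M$ measurements of one vertex.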

Having chosen the set of measurements and defined the
projection, it is necessary to identify those nodes which
are characterized by small dispersion of the measurements.
Vertices with similar topological features tend
to be mapped close to one another in the two-dimensional
space. However, visual inspection can provide inaccurate
results because of the form of the distribution of points in
the projection. Thus, we estimated the probability density
in the 2D space by considering the non-parametric
Parzen windows approach. This method involves
convolving the feature vectors (represented as Dirac
deltas in the 2D projected space) with a bivariate
Gaussian function (a normal distribution), allowing the
interpolation of the probability density. In this way, high
concentrations of points yield peaks in the probability
density, which correspond to respective classes of vertices
with homogeneous connectivity.
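The Parzen-window estimate described above amounts to summing one isotropic Gaussian per projected vertex. The following sketch evaluates that sum at a point or on a grid; the function names and the bandwidth `sigma` are illustrative assumptions, since the text does not state the kernel width that was used.

```python
import math


def parzen_density(points, x, y, sigma=0.5):
    """Gaussian Parzen-window estimate of the density at (x, y).

    `points` is a list of (x, y) pairs in the projected space and
    `sigma` is the (assumed) standard deviation of the kernel.
    """
    norm = 1.0 / (2.0 * math.pi * sigma ** 2 * len(points))
    return norm * sum(
        math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2.0 * sigma ** 2))
        for px, py in points)


def density_grid(points, xs, ys, sigma=0.5):
    """Density sampled on a regular grid; rows follow `ys`, columns `xs`."""
    return [[parzen_density(points, x, y, sigma) for x in xs] for y in ys]
```

Smaller values of `sigma` resolve more peaks but make the estimate noisier, so the number of detected peaks depends on this choice.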

After determining the probability density of the node
measurements as mapped into the two-dimensional projection,
it is necessary to identify the obtained clusters
of regular nodes. This is performed starting with the
highest peak of the density. A cluster is created and
associated with this peak, and its value is assigned to a control
variable $V$. The value of $V$ is successively decreased and
used to threshold the density, from which any new
peaks are searched. In case a new peak appears, a new
cluster is defined. Whenever two peaks merge, as a consequence
of the progressive reduction of $V$, their respective
clusters are subsumed, creating a branch in the hierarchical
structure of clusters. When $V$ reaches its final value
of 0, a tree representing the progressive merging of the
peaks (and respective clusters) is obtained. The most
significant clusters are identified by taking the clusters
corresponding to the longest segments in the obtained
tree.

However, the nodes defining a cluster do not necessarily
correspond to a regular patch, as they might not
be connected in the original graph. Therefore, the last
step in the regular patch detection corresponds to obtaining
the connected subgraphs for each considered homogeneous
class in the distribution. Three indicators of the
regularity of the network under analysis are considered
in the current work: (i) the number of detected peaks
$P$, (ii) the relative size of the maximum component identified
considering all detected peaks, and (iii) the index of
dispersion of the measurements inside the largest regular
region. The value of $P$ defines the number of different
structures that can be found in the network. These structures
can be thought of as generalized types of motifs, because
they do not have regular pre-defined structures
but are statistically similar. In order to compare results
obtained for networks with different sizes, we define the
simplicity coefficient as the ratio between the number
of vertices in the largest connected region ($s$) and
the network size ($N$),

$$S = \frac{s}{N}. \qquad (1)$$



In addition, since the regularity can vary inside the
largest regular region, we can define a super-regularity
coefficient. The elements of the eigenvector associated
with the largest eigenvalue, $v_1$, provide the dispersion
of each respective measurement. Therefore, the level of
regularity of the region can be quantified in terms of the
coefficient of variation of the elements of such eigenvector,
i.e. the super-regularity coefficient can be given by
(2)

where $\langle v_1 \rangle$ is the average and $\sigma_{v_1}$ is the
standard deviation of the elements of $v_1$. It can be shown that
$0 < F < 1$. Note that networks presenting values of $S$ and $F$
close to one tend to be highly regular regarding all the
considered measurements.
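The last detection step (splitting each cluster of similar nodes into connected patches) and the simplicity coefficient, i.e. the size of the largest connected region divided by the network size, can be sketched as follows. The function names and the adjacency-dictionary input are hypothetical, and the clusters are assumed to be given as lists of node labels obtained from the peak-detection step.

```python
from collections import deque


def connected_patches(adj, members):
    """Connected components of the subgraph induced by `members`.

    Returns lists of nodes, largest patch first.
    """
    members = set(members)
    seen, patches = set(), []
    for start in members:
        if start in seen:
            continue
        seen.add(start)
        patch, queue = [], deque([start])
        while queue:  # breadth-first search restricted to `members`
            u = queue.popleft()
            patch.append(u)
            for v in adj[u]:
                if v in members and v not in seen:
                    seen.add(v)
                    queue.append(v)
        patches.append(sorted(patch))
    return sorted(patches, key=len, reverse=True)


def simplicity_coefficient(adj, clusters):
    """Size of the largest connected patch over all clusters, over N."""
    largest = 0
    for cluster in clusters:
        patches = connected_patches(adj, cluster)
        if patches:
            largest = max(largest, len(patches[0]))
    return largest / len(adj)


# Path graph 0-1-2-3-4-5 with two hypothetical clusters of similar nodes.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
value = simplicity_coefficient(path, [[0, 1, 2, 5], [3, 4]])
```

In this toy case the first cluster splits into the patches `[0, 1, 2]` and `[5]`, so the largest regular region has three of the six nodes.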

We illustrate the above methodology by considering
the network in Figure 2(a). First, a set of measurements
is extracted and the vertices are projected into
a two-dimensional space. Note that the projection
using principal component analysis is necessary only
when more than two measurements are considered.
We took into account two different configurations
of measurements: (i) $M_1 = \{k(i), cc(i)\}$, and (ii)
$M_2 = \{k(i), cc(i), r(i), loc(i)\}$. In the first case, the
two main regular regions are composed of the vertices
$R_1 = \{21, 24, 25, 26, 30, 31, 35, 36, 40, 42, 43, 44\}$ and
$R_2 = \{22, 23, 27, 28, 29, 32, 33, 34, 37, 38, 39\}$. While
most of the vertices corresponding to region $R_1$ belong
to the border of the regular region of Figure 2(a), the
vertices of $R_2$ are internal to that region. In the latter
case, i.e. considering a larger set of measurements, the
connected main region corresponds to the vertices $R =
\{22, 23, 24, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39,
40, 41, 42, 43, 44, 45\}$, which forms the regular region
presented in Figure 2(a). Our analysis of network models
and real-world networks took into account these four
local measurements.

FIG. 2: Illustration of the method to determine the simple
patches in the network (a). (b) Distribution of the degree and
clustering coefficient of each vertex and (c) the respective
probability density function. (d) The PCA projection of the
measurements $M = \{k(i), cc(i), r(i), loc(i)\}$ into the
two-dimensional space and (e) the respective probability density.
The numbers in (c) and (e) indicate the order of the peaks, from
the highest to the lowest.

In order to verify the importance of simple regions
for the dynamics of networks, we considered the
analysis of cascade failures. Cascade failures result in
avalanches of breakdowns over the network when nodes
and links are sensitive to overloading [12, 13]. For a given
network, a quantity of information (or energy) can be
interchanged between pairs of nodes following the shortest
paths at each time step. The capacity of a node $i$, $C_i$,
which is the maximum load that $i$ can handle, is
proportional to its initial load $L_i$, $C_i = (1 + \alpha) L_i$.
We represent the load of node $i$ by its betweenness
centrality [12, 13]. When a single node is removed from the
network, the redistribution of flows starts over the network
and cascades can be triggered. Indeed, such removals
change the shortest paths between nodes and,
consequently, the distribution of the loads, creating
overloads on some nodes. For $\alpha = 0$, it is guaranteed that at
time $t = 0$ no node is overloaded and the system is working
properly. Larger values of $\alpha$ increase the capacity of
nodes and reduce the chance of cascade breakdown.

The analysis of cascade failures is performed by monitoring
the avalanches when a single node in the regular and
irregular regions is removed, independently. The damage
caused by a cascade is quantified in terms of the relative
size $G$ of the largest component, $G = N_f/N$, where $N_f$
is the size of the largest component after the avalanche.
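The cascade procedure can be sketched as below, under stated assumptions: loads are shortest-path betweenness values computed with Brandes' algorithm (path endpoints excluded, ordered pairs counted), capacities are fixed at $(1 + \alpha)$ times the initial load, and every node whose load exceeds its capacity is removed simultaneously at each step. The function names are illustrative and the sketch is not claimed to reproduce the exact simulation protocol used for the reported results.

```python
from collections import deque


def betweenness(adj):
    """Shortest-path betweenness (Brandes) of an unweighted graph."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0.0)
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1.0, 0
        queue = deque([s])
        while queue:  # breadth-first search counting shortest paths
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # accumulate dependencies, farthest nodes first
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc


def remove_nodes(adj, gone):
    return {u: {v for v in nbrs if v not in gone}
            for u, nbrs in adj.items() if u not in gone}


def largest_component(adj):
    seen, best = set(), 0
    for s in adj:
        if s in seen:
            continue
        seen.add(s)
        queue, size = deque([s]), 0
        while queue:
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        best = max(best, size)
    return best


def cascade(adj, trigger, alpha):
    """Relative size G of the largest component after the avalanche."""
    n = len(adj)
    capacity = {v: (1.0 + alpha) * load
                for v, load in betweenness(adj).items()}
    current = remove_nodes(adj, {trigger})
    while True:
        load = betweenness(current)
        overloaded = {v for v in current if load[v] > capacity[v] + 1e-9}
        if not overloaded:
            return largest_component(current) / n
        current = remove_nodes(current, overloaded)


def mean_damage(adj, region, alpha):
    """Average G over single-node removals inside `region`."""
    return sum(cascade(adj, v, alpha) for v in region) / len(region)
```

For a ring of six nodes, removing one node with $\alpha = 0.2$ overloads the three middle nodes of the remaining path and leaves only isolated nodes, whereas with $\alpha = 1$ no further node fails.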

Figure 3 shows the value of $G$ when a single vertex is
removed in the largest simple and non-simple (the remainder
of the network) regions for the Erdős-Rényi (ER),
Knitted (KT), Barabási-Albert (BA) and Krapivsky-Redner
(NL) (based on non-linear preferential attachment,
$P_{i \to j} = k_j^{\gamma} / \sum_u k_u^{\gamma}$) theoretical network
models. Each point in the scatterplot corresponds to
an average of the relative size of the resulting component
$G$ after the removal of each vertex $i$ in the simple and
non-simple regions.

TABLE I: The simplicity in different networks.

FIG. 3: Cascading failure in BA (circles) and NL (squares)
network models, and in ER (stars) and KT (triangles) (in the
inset). The removal of vertices in the simple regions is
represented by white symbols and in the heterogeneous regions
by black symbols. Each curve corresponds to an average of
$G$ after removing every vertex in each region, while the error
bars represent the standard deviations.

As we can see, the removal of vertices in the simple
region tends to cause smaller damage to the network than
the removal of vertices in the heterogeneous region for all
models. This property had already been observed considering
the whole network, in which homogeneous networks
tend to be more robust under failures and targeted
attacks [13]. Therefore, the simpler regions are fundamental
to network robustness. In addition, the curves
obtained for models with similar structures, such as BA and NL,
present similar behaviors. The same happens with the
ER and KT network models, the latter corresponding
to the most regular model in our analysis, which is
reflected in Table I.

The protein databases were obtained from the BioGRID
repository for protein interactions. The analysis
of the protein-protein interaction networks of the
progressively more evolved species Saccharomyces cerevisiae,
Drosophila melanogaster and Homo sapiens allowed us
to investigate how the simplicity of the connections has
changed during evolution under natural selection. As we
can see in Table I, the level of simplicity clearly increases
with evolution: the protein interaction network of
H. sapiens is the most regular. The super-regularity
coefficient also increases with the complexity of the organism.
A possible explanation of this remarkable phenomenon
is related to protein evolution. Since hubs tend to
evolve more slowly than less connected proteins, because
they are more important in the organism, the addition of
new connections due to mutation and duplication tends
to favor the proteins that are not hubs, therefore increasing
regularity [13]. This process implies that more robust
networks are obtained for more complex organisms.
Considering the cascade effect on the protein interaction
networks, we observed that the removal of vertices in the
regular region tends to be less destructive than removal in
the non-regular region, as also observed for the BA and
NL network models in Figure 3.

While analyzing the effect of the regularity on the
largest regular regions in BA, NL and the three
protein-protein interaction networks, which are scale-free
networks, we observed that there is a positive correlation
between the super-regularity coefficient and the average
of the relative size $G$ of the largest component for
$0 < \alpha < 0.1$: the obtained Pearson coefficient is 0.7.
This indicates that the more regular the regions, the more
robust they are under removal of vertices in such regions.
Moreover, we investigated the effect of the variation of
each of the four considered measurements ($k(i)$, $cc(i)$, $r(i)$ and
$loc(i)$) and observed that each of them is correlated with the
average of the relative size $G$. Therefore, we can conclude
that the smaller the variation of the local measurements
inside the largest regular region, the more robust the network
is with respect to the cascade dynamics. This is a
consequence of the homogeneous distribution of shortest
paths in such regions: as the vertices tend to have
similar properties, their betweenness centrality becomes
similar. This effect was identified in all the considered
networks. Indeed, vertices with the highest betweenness
tend to be outside the regular regions, since such vertices
generally present distinct local properties (such as the
hubs). These vertices tend to be the outliers in the PCA
projection.

The reported method for identifying regular subgraphs
(or patches) within complex networks, as well as the
respectively obtained results, illustrated and corroborated
the importance of considering patchwise regularity in order
to characterize and obtain insights about the properties
of theoretical and real-world networks. This work opens
several possibilities for further investigation,
which include but are not limited to the consideration
of other real-world networks, the selection of additional
topological measurements, and the analysis of the evolution of
simplicity in biological networks such as the brain, food
webs and genetic networks. In addition, our methodology
can be applied to the identification of more general
types of motifs, defined by similar structures. These motifs
may not have fully regular structure, as currently defined,
but would be characterized by statistically uniform
properties.